PANGLOSS: Knowledge-Based Machine Translation

Author

  • Eduard H. Hovy
Abstract

The goals of the PANGLOSS project are to investigate and develop a new-generation knowledge-based interlingual machine translation system, combining symbolic and statistical techniques. The system is to translate newspaper texts in arbitrary domains (though a specific financial domain is given preference) to as high quality as possible using as little human intervention as possible. The project involves three sites (USC/ISI, New Mex…).

Within PANGLOSS, it is the particular focus of ISI to strive toward large-scale system coverage by investigating the feasibility and utility of combined statistical and human acquisition techniques of grammars, lexicons, and semantic knowledge. To this end, we have acquired several large resources, especially of Japanese lexical information, and are developing methods to integrate this knowledge with the ongoing development of Japanese parsing and semantic analysis and Ontology term acquisition and taxonomization.

The most recent ARPA evaluations of several MT systems, including PANGLOSS, are not yet available. However, preliminary measurements indicate that translators performed around 40% more quickly using the system than translating manually (for Spanish to English; the Japanese effort is only 6 months old at this time).

In recent work, we have:

  • continued the construction of the PANGLOSS Ontology, the taxonomy of terms used in the semantic interlingua representation (the Ontology now contains approx. 50,000 items);
  • acquired and deployed the lexical analyzer JUMAN and the parser SAX, with their accompanying 130,000-item wordlist;
  • acquired a bilingual Japanese-English dictionary of approx. 70,000 entries and fully decoded its contents;
  • acquired several other Japanese lexicons of various sizes and amounts of information;
  • developed algorithms for linking Japanese lexical items to the Ontology;
  • developed an English lexicon for our Penman sentence generator that contains approx. 70,000 items;
  • developed several mappers that convert the output of one module of PANGLOSS into the input of another (all these mappers employ the same bottom-up unification-based chart parser);
  • developed a collection of 200,000 statistically-based rules that govern the inclusion of the articles "the" and "a" into English text without articles (which is how it would come from Japanese).

Our major efforts for the next year fall in four areas:

  1. Japanese parsing, analysis, and lexis: the continued extension and testing of the current systems and lexicons;
  2. Spanish semantic analysis: the development of the current mapper from the NMSU parser output to interlingua form into a more powerful and robust semantic mapper;
  3. Ontology …
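The abstract mentions statistically-based rules for restoring "the" and "a" in article-free English, but it does not describe their form. The following is only a minimal sketch of the general idea, under assumptions of our own: each rule is keyed on a pair (previous word, current word) and stores the article choice seen most often in a toy training text. The function names `learn_article_rules` and `insert_articles`, the context features, and the training sentences are all hypothetical and are not taken from PANGLOSS.

```python
from collections import Counter, defaultdict

ARTICLES = {"a", "an", "the"}
NO_ARTICLE = "<none>"


def learn_article_rules(sentences):
    """Learn one rule per context from English text that contains articles.

    A context is (previous content word, current content word); the rule
    records whether "the", "a", or nothing most often precedes the current
    word in that context. These features are an illustrative assumption.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        prev = "<s>"  # sentence-start marker
        i = 0
        while i < len(tokens):
            token = tokens[i]
            if token in ARTICLES and i + 1 < len(tokens):
                head = tokens[i + 1]
                # Treat "a" and "an" as the same indefinite choice.
                choice = "the" if token == "the" else "a"
                counts[(prev, head)][choice] += 1
                prev = head
                i += 2
            else:
                counts[(prev, token)][NO_ARTICLE] += 1
                prev = token
                i += 1
    # Keep only the most frequent choice for each context.
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def insert_articles(sentence, rules):
    """Apply the learned rules to an article-free English sentence."""
    output = []
    prev = "<s>"
    for token in sentence.lower().split():
        choice = rules.get((prev, token), NO_ARTICLE)
        if choice == "the":
            output.append("the")
        elif choice == "a":
            # Pick "an" before a vowel letter (a spelling heuristic only).
            output.append("an" if token[0] in "aeiou" else "a")
        output.append(token)
        prev = token
    return " ".join(output)


# Toy training text (hypothetical), then two article-free inputs.
training = [
    "the bank raised the rate",
    "the bank sold the bond",
    "an analyst sold the bond",
]
rules = learn_article_rules(training)
print(insert_articles("bank sold bond", rules))
print(insert_articles("analyst raised rate", rules))
```

Unseen contexts fall back to inserting no article. A realistic rule set would presumably need richer context than two adjacent words, which this sketch does not attempt.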

Similar resources

The Workstation Substrate of the Pangloss Project

Pangloss is a new knowledge-based machine translation project carried out jointly by the Center for Machine Translation at Carnegie Mellon University, Computing Research Laboratory of New Mexico State University and Information Sciences Institute of the University of Southern California. One of the distinguishing features of systems built in this project is that they uniformly aim at high-quali...

Building A Large Ontology For Machine Translation

This paper describes efforts underway to construct a large-scale ontology to support semantic processing in the PANGLOSS knowledge-based machine translation system. Because we are aiming at broad semantic coverage, we are focusing on automatic and semi-automatic methods of knowledge acquisition. Here we report on algorithms for merging complementary online resources, in particular the LDOCE and ...

In-Depth Knowledge-Based Machine Translation

The development of an integrated knowledge-based machine-aided translation system called PANGLOSS in collaboration with the Center for Machine Translation (CMT) at CMU and the Computing Research Laboratory (CRL) at New Mexico State University. The ISI part of the collaboration is focused initially on providing the system's output capabilities, primarily in English and then in other languages, ...

Pangloss: A Machine Translation Project

The project involves three sites (NMSU, USC, CMU) and is devoted to enhancing the state of the art in machine translation of natural language texts. Pangloss uses a hybrid, multi-engine approach, though knowledge-based machine translation takes a majority of resources. Types of work in the knowledge-based direction include: • continuing development of a set of knowledge acquisition tools and ut...

A Comprehensive Review about Machine Translation and Ontologies

This comprehensive review of machine translation and ontologies investigates both ontologies for MT and MT for ontologies. The former identifies a research area (’80s-’90s) dealing with interlingual knowledge-based MT systems. A general background is presented, together with two case studies, namely PANGLOSS and Mikrokosmos. The latter is a topic that appeared recently with r...

Publication date: 1994